Improving Document Ranking with Dual Word Embeddings
نویسندگان
چکیده
This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the input space and the document words into the output space, and compute a relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) provides evidence that a document is about a query term, in addition to and complementing the traditional term frequency based approach.
منابع مشابه
A Dual Embedding Space Model for Document Ranking
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unl...
متن کاملWord embeddings and Global Preference for Contextual Suggestion
In this paper we describe our effort on 2016 Contextual Suggestion Track. We present a new ranking model that captures both global trend of interests as well as contextual individual preference. We trained a regressor on common users data thus it can prioritize popular places and categories. In order to model individual user preference, we introduce word embeddings to represent both user profil...
متن کاملUtilizing Embeddings for Ad-hoc Retrieval by Document-to-document Similarity
Latent semantic representations of words or paragraphs, namely the embeddings, have been widely applied to information retrieval (IR). One of the common approaches of utilizing embeddings for IR is to estimate the document-to-query (D2Q) similarity in their embeddings. As words with similar syntactic usage are usually very close to each other in the embeddings space, although they are not seman...
متن کاملLeveraging word embeddings for spoken document summarization
Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly fa...
متن کاملSemi-Supervised Information Retrieval System for Clinical Decision Support
This article summarizes the approach developed for TREC 2016 Clinical Decision Support Track. In order to address the daunting challenge of retrieval of biomedical articles for answering clinical questions, an information retrieval methodology was developed that combines pseudo-relevance feedback, semantic query expansion and document similarity measures based on unsupervised word embeddings. T...
متن کامل